In a conventional document analysis system and document adaptation system, a layout is analyzed using the section strength of document description elements, and display regions are allocated to the components of the analyzed layout. Information of each component is displayed in its display region under a desired display condition, such as an enlarged image, and titles of the components are selectively displayed in the respective display regions. This realizes a display of a structured/semi-structured document under desired display conditions while maintaining its layout (see Japanese Laid-Open Patent Application JP-P2001-184344A).
A document description element is a description unit of a structured/semi-structured document, exemplified by an element delimited by HTML tags, such as a TABLE element or an A element in an HTML document. A layout component is a partial region that displays related information and composes a part of a display screen; for example, a partial region formed by the information related to a certain headline in an HTML document.
Moreover, in order to generate a document suited to the screen display, an index document is generated from the document description elements having specific names, in accordance with a rule that uses the names of the document description elements, and a document describing the contents of the index items is generated (see Japanese Laid-Open Patent Application JP-A-Heisei, 9-251457).
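The name-based index generation described above can be illustrated by the following minimal sketch. It is not taken from the cited application; the rule (element names h1 to h3), the class name NameRuleIndexer, the function build_index, and the sample document are all assumptions introduced only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical rule: elements with these names are treated as index items.
INDEX_ELEMENT_NAMES = {"h1", "h2", "h3"}


class NameRuleIndexer(HTMLParser):
    """Collects the text of every element whose name matches the rule."""

    def __init__(self):
        super().__init__()
        self.entries = []   # list of (element name, text) pairs
        self._open = None   # name of the index element currently open
        self._text = []     # text fragments collected for that element

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag in INDEX_ELEMENT_NAMES and self._open is None:
            self._open = tag
            self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._open:
            self.entries.append((tag, "".join(self._text).strip()))
            self._open = None


def build_index(html):
    """Return the index items found by the name-based rule."""
    indexer = NameRuleIndexer()
    indexer.feed(html)
    indexer.close()
    return indexer.entries


# Assumed sample document, written only for this sketch.
SAMPLE = """
<html><body>
<h1>News</h1><p>Top stories of the day.</p>
<h2>Sports</h2><p>Match results.</p>
</body></html>
"""

index = build_index(SAMPLE)
for name, text in index:
    print(name, text)
```

In this sketch the index depends only on element names, so any headline that is not marked up with one of the listed names would be left out of the index.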
Furthermore, in order to generate a document desired by a user, a composite document composed of necessary information is generated in accordance with the URLs of structured/semi-structured documents, references to the document description elements indicating the parts of the respective documents in which the necessary information exists, and a rule related to the regions in which the necessary information is displayed (see Japanese Laid-Open Patent Application JP-P2004-139275A).
In relation to the present invention, Japanese Laid-Open Patent Application JP-A-Heisei, 10-289250 discloses a technique to allow intuitive recognition of the page located at a registered URL by displaying not only title information but also image information when a list of registered URLs is displayed.
Japanese Laid-Open Patent Application JP-A-Heisei, 11-203285A discloses a technique to give a precise meaning to each document element of an original document. In this technique, a line property indicating the position of a document element within a line is determined for each line, and the meaning of the document element is determined for each line on the basis of the meaning of each of the morphemes composing the document element and the line property of the line to which the document element belongs.
Japanese Laid-Open Patent Application JP-P2003-85159A discloses a technique to prepare an index automatically by analyzing a top document of a group of desired structured documents and to compose the index with image data of related documents, in order to present the user with a document which is easy to read.
Japanese Laid-Open Patent Application JP-P2004-86855 discloses a technique to facilitate preparation and editing of a document by cross-referencing the contents and the index of the document. More specifically, in this known technique, a link for generating the document content information corresponding to an index item is embedded when a document index is generated. Thereafter, the document content information containing the index item is generated by designating the link. In generating this information, a link for instructing an output of the index is embedded in the document content information. The index containing the index items corresponding to the document contents is generated by designating the link in the document content information. In this case, the links for generating the document content information corresponding to the index items are embedded in the index.
Japanese Laid-Open Patent Application JP-P2003-288334 discloses a technique to generate, with high accuracy, a tagged structured document from a printed document composed of a plurality of pages.
Japanese Laid-Open Patent Application JP-P2003-330856 discloses a technique to allow improved access to both local information and global information of contents by generating a layout and adjusting the information granularity dynamically in accordance with an operation to modify a zoom factor.
The first problem in the conventional techniques is that the conventional document analysis systems are often unable to analyze the layout of a structured/semi-structured document as intended by the document provider. This is because, due to the variety of document description formats, a layout analysis that uses the section strength of the document description elements cannot recover the layout intended by the document provider.
The second problem in the conventional techniques is that the conventional document analysis systems can analyze only a part of the titles of a structured/semi-structured document. This is because a title of a structured/semi-structured document is usually expressed by the name, property, style, and content of a document description element, and the conventional title analysis, which is based on a rule that uses only the name of the document description element, cannot analyze all of the titles.
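The limitation of a name-only rule can be illustrated by the following minimal sketch. The element records, the hand-assigned "is_title" labels, the set TITLE_ELEMENT_NAMES, and the function name_only_rule are hypothetical and serve only to show how titles expressed through a property or a style escape a rule that inspects element names alone.

```python
# Hypothetical name-only rule: only these element names count as titles.
TITLE_ELEMENT_NAMES = {"h1", "h2", "h3"}

# Assumed sample elements. "is_title" is a hand-assigned label that stands
# for what the document provider intended; it is not computed.
ELEMENTS = [
    {"name": "h1", "attrs": {}, "text": "News", "is_title": True},
    {"name": "div", "attrs": {"class": "headline"},
     "text": "Sports", "is_title": True},
    {"name": "span", "attrs": {"style": "font-size:20px;font-weight:bold"},
     "text": "Weather", "is_title": True},
    {"name": "p", "attrs": {}, "text": "Sunny with light wind.",
     "is_title": False},
]


def name_only_rule(element):
    """Treat an element as a title solely on the basis of its name."""
    return element["name"] in TITLE_ELEMENT_NAMES


# Intended titles that the name-only rule finds, and those it misses.
found = [e["text"] for e in ELEMENTS if e["is_title"] and name_only_rule(e)]
missed = [e["text"] for e in ELEMENTS
          if e["is_title"] and not name_only_rule(e)]

print("found: ", found)
print("missed:", missed)
```

Under these assumptions, the title marked up with a heading element is found, while the titles expressed through a class property or an inline style are missed.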
The third problem in the conventional techniques is that a third person cannot use the layout information analyzed by the conventional document analysis system for structured/semi-structured documents in order to develop application software. This is because the conventional document analysis system does not output the analyzed layout information in a format which can be utilized by a third person.
The fourth problem in the conventional techniques is the difficulty of adapting documents to the environments of networks, terminals, or users in accordance with the logical structure of the document intended by the document provider. This is because only a part of the titles can be analyzed when the document index is generated in accordance with the rule using the names of the document description elements, so that the index document cannot be generated precisely. Moreover, when a composite document is generated in accordance with a user-defined rule that uses the URL (uniform resource locator) of the document and a reference to the document description element indicating the part in which the necessary information of the document exists, the composite document desired by the user may not be generated precisely once the document is updated, and such a rule prevents the logical structure of the document intended by the document provider from being represented precisely.
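The fragility of a user-defined element reference when the document is updated can be illustrated by the following minimal sketch. The reference string, the two document versions, and the function extract are assumptions made only for illustration; well-formed markup is assumed so that the standard xml.etree.ElementTree module can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical user-defined reference: the cell of the second table.
ELEMENT_REFERENCE = "./body/table[2]/tr/td"

# Assumed document before the update.
OLD_DOCUMENT = (
    "<html><body>"
    "<table><tr><td>Menu</td></tr></table>"
    "<table><tr><td>Weather: sunny</td></tr></table>"
    "</body></html>"
)

# Assumed document after the update: the provider has inserted a banner
# table at the top, which shifts the position of every later table.
NEW_DOCUMENT = (
    "<html><body>"
    "<table><tr><td>Banner</td></tr></table>"
    "<table><tr><td>Menu</td></tr></table>"
    "<table><tr><td>Weather: sunny</td></tr></table>"
    "</body></html>"
)


def extract(document, reference):
    """Return the text of the element selected by the reference, or None."""
    root = ET.fromstring(document)
    node = root.find(reference)
    return None if node is None else node.text


old_result = extract(OLD_DOCUMENT, ELEMENT_REFERENCE)
new_result = extract(NEW_DOCUMENT, ELEMENT_REFERENCE)

print("before update:", old_result)
print("after update: ", new_result)
```

In this sketch the same reference selects the weather information before the update and the menu after it, because the reference encodes a position in the markup rather than the logical role of the information.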